AIBase

Shanghai AI Lab Releases Open-Source 'Shusheng・Wanjuan' 1.0 Multi-Modal Pre-training Dataset

Shanghai AI Lab, in collaboration with the Corpus Data Alliance, has open-sourced 'Shusheng・Wanjuan' 1.0, a multi-modal pre-training dataset made up of text, image, and video data totaling more than 2 TB. The data has undergone fine-grained cleaning and deduplication, and the release emphasizes multi-dimensional integration, careful processing, and ease of use. The open release is intended to promote the application and innovation of large models and to lower the technical barriers to working with large-model technology.

© 2025 AIBase